    Bounds on Query Convergence

    The problem of finding an optimum using noisy evaluations of a smooth cost function arises in many contexts, including economics, business, medicine, experiment design, and foraging theory. We derive an asymptotic bound E[ (x_t - x*)^2 ] >= O(1/sqrt(t)) on the rate of convergence of a sequence (x_0, x_1, ...) generated by an unbiased feedback process observing noisy evaluations of an unknown quadratic function maximised at x*. The bound is tight, as the proof leads to a simple algorithm which meets it. We further establish a bound on the total regret, E[ sum_{i=1..t} (x_i - x*)^2 ] >= O(sqrt(t)). These bounds may impose practical limitations on an agent's performance, as O(eps^-4) queries must be made before the queries converge to x* with accuracy eps. Comment: 6 pages, 2 figures
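    The abstract does not spell out the algorithm that meets the bound, but the setting (noisy function values only, quadratic objective) is the classical stochastic-approximation one. The Python sketch below runs a Kiefer-Wolfowitz-style finite-difference scheme on a toy noisy quadratic; the gain schedules, the noise model, and the function noisy_f are illustrative assumptions, not the construction from the paper.

    import numpy as np

    def noisy_f(x, x_star=2.0, sigma=1.0, rng=None):
        # Unknown quadratic maximised at x_star, observed with additive noise (assumed model).
        if rng is None:
            rng = np.random.default_rng()
        return -(x - x_star) ** 2 + sigma * rng.normal()

    def kiefer_wolfowitz(T=10000, x0=0.0, a=0.5, c=0.5, seed=0):
        """Finite-difference stochastic approximation from noisy evaluations only."""
        rng = np.random.default_rng(seed)
        x = x0
        for t in range(1, T + 1):
            a_t = a / t            # step-size schedule (assumed)
            c_t = c / t ** 0.25    # probe-width schedule (assumed)
            g = (noisy_f(x + c_t, rng=rng) - noisy_f(x - c_t, rng=rng)) / (2 * c_t)
            x = x + a_t * g        # ascend the estimated gradient
        return x

    print(kiefer_wolfowitz())      # approaches x_star = 2.0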

    Automatic Differentiation of Algorithms for Machine Learning

    Automatic differentiation---the mechanical transformation of numeric computer programs to calculate derivatives efficiently and accurately---dates to the origin of the computer age. Reverse mode automatic differentiation both antedates and generalizes the method of backwards propagation of errors used in machine learning. Despite this, practitioners in a variety of fields, including machine learning, have been little influenced by automatic differentiation, and make scant use of available tools. Here we review the technique of automatic differentiation, describe its two main modes, and explain how it can benefit machine learning practitioners. To reach the widest possible audience our treatment assumes only elementary differential calculus, and does not assume any knowledge of linear algebra. Comment: 7 pages, 1 figure
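    As a concrete illustration of one of the two modes the paper reviews, the Python sketch below implements forward-mode AD with dual numbers, carrying a value and a derivative together through an ordinary numeric program. The Dual class and the small operator set it supports are illustrative and not tied to any particular AD tool.

    from dataclasses import dataclass
    import math

    @dataclass
    class Dual:
        # Forward-mode AD: each quantity carries its value and its derivative.
        val: float
        dot: float = 0.0

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.dot + other.dot)

        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val * other.val,
                        self.val * other.dot + self.dot * other.val)

        __rmul__ = __mul__

    def sin(x):
        # Chain rule applied alongside the primal computation.
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

    def f(x):
        return x * x + sin(x)        # an ordinary numeric program

    x = Dual(1.5, 1.0)               # seed dx/dx = 1
    y = f(x)
    print(y.val, y.dot)              # f(1.5) and f'(1.5) = 2*1.5 + cos(1.5)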

    Soft-LOST: EM on a Mixture of Oriented Lines

    Robust clustering of data into overlapping linear subspaces is a common problem. Here we consider one-dimensional subspaces that cross the origin. This problem arises in blind source separation, where the subspaces correspond directly to columns of a mixing matrix. We present an algorithm that identifies these subspaces using an EM procedure, where the E-step calculates posterior probabilities assigning data points to lines and the M-step repositions the lines to match the points assigned to them. This method, combined with a transformation into a sparse domain and an L1-norm optimisation, constitutes a blind source separation algorithm for the under-determined case
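    A minimal NumPy sketch of the E- and M-steps described above follows; the isotropic Gaussian distance model, the fixed sigma, and the toy data are assumptions made for illustration rather than the paper's exact formulation.

    import numpy as np

    def soft_lost_em(X, K, n_iter=50, sigma=0.3, seed=0):
        """EM over K one-dimensional subspaces (lines through the origin).

        X: (N, D) data; returns (K, D) unit line directions."""
        rng = np.random.default_rng(seed)
        V = rng.normal(size=(K, X.shape[1]))
        V /= np.linalg.norm(V, axis=1, keepdims=True)
        for _ in range(n_iter):
            # E-step: posterior responsibility of each line for each point,
            # based on the squared distance from the point to the line.
            proj = X @ V.T                                   # (N, K) projections
            d2 = (X ** 2).sum(1, keepdims=True) - proj ** 2  # squared distances
            logp = -d2 / (2 * sigma ** 2)
            p = np.exp(logp - logp.max(1, keepdims=True))
            p /= p.sum(1, keepdims=True)
            # M-step: reposition each line along the principal eigenvector of
            # the responsibility-weighted covariance of the points.
            for k in range(K):
                C = (X * p[:, k:k + 1]).T @ X
                w, U = np.linalg.eigh(C)
                V[k] = U[:, -1]
        return V

    # Toy usage: two lines through the origin in 2-D.
    rng = np.random.default_rng(1)
    t = rng.normal(size=(200, 1))
    X = np.vstack([t[:100] * np.array([1.0, 2.0]),
                   t[100:] * np.array([2.0, -1.0])]) + 0.05 * rng.normal(size=(200, 2))
    print(soft_lost_em(X, K=2))      # directions near [1, 2] and [2, -1], up to sign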

    An Analysis of Publication Venues for Automatic Differentiation Research

    We present the results of our analysis of publication venues for papers on automatic differentiation (AD), covering academic journals and conference proceedings. Our data are collected from the AD publications database maintained by the autodiff.org community website. The database is purpose-built for the AD field and is expanding via submissions by AD researchers. Therefore, it provides a relatively noise-free list of publications relating to the field. However, it does include noise in the form of variant spellings of journal and conference names. We handle this by manually correcting and merging these variants under the official names of the corresponding venues. We also share the raw data obtained after these corrections. Comment: 6 pages, 3 figures

    Simplifying Neural Network Soft Weight-sharing Measures by Soft Weight-measure Soft Weight Sharing

    The abstract is included in the text

    Gradient Descent: Second-Order Momentum and Saturating Error

    Batch gradient descent, Δw(t) = -η dE/dw(t), converges to a minimum of quadratic form with a time constant no better than ¼ λ_max/λ_min, where λ_min and λ_max are the minimum and maximum eigenvalues of the Hessian matrix of E with respect to w. It was recently shown that adding a momentum term, Δw(t) = -η dE/dw(t) + α Δw(t-1), improves this to ≈ sqrt(λ_max/λ_min), although only in the batch case. Here we show that second-order momentum, Δw(t) = -η dE/dw(t) + α Δw(t-1) + β Δw(t-2), can lower this no further. We then regard gradient descent with momentum as a dynamic system and explore a non-quadratic error surface, showing that saturation of the error accounts for a variety of effects observed in simulations and justifies some popular heuristics
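    The NumPy sketch below is only a toy comparison of plain batch gradient descent against descent with momentum on a two-dimensional quadratic with a prescribed eigenvalue spread; the Hessian, step size, and momentum coefficients are illustrative choices, not values from the paper.

    import numpy as np

    # Quadratic error E(w) = 0.5 * w^T H w with eigenvalues lam_min, lam_max.
    H = np.diag([0.01, 1.0])                 # lam_min = 0.01, lam_max = 1.0

    def descend(eta, alpha=0.0, beta=0.0, T=2000):
        """Batch gradient descent with optional first-order (alpha) and
        second-order (beta) momentum; returns the error trajectory."""
        w = np.array([1.0, 1.0])
        dw_prev = np.zeros(2)
        dw_prev2 = np.zeros(2)
        errs = []
        for _ in range(T):
            dw = -eta * (H @ w) + alpha * dw_prev + beta * dw_prev2
            w = w + dw
            dw_prev2, dw_prev = dw_prev, dw
            errs.append(0.5 * w @ H @ w)
        return np.array(errs)

    # Illustrative settings (not the paper's): plain descent vs momentum.
    plain = descend(eta=1.0)
    momentum = descend(eta=1.0, alpha=0.8)
    print(plain[-1], momentum[-1])           # momentum reaches lower error sooner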

    Comments on 'Hebbian learning is jointly controlled by electrotonic and input structure'

    It is argued that simulations presented by Tsai, Carnevale and Brown do not agree with their theoretical predictions and that their mathematical derivation contains a major flaw. The origin of these misunderstandings is traced to the application of a special case of an equation whose general version is given here.

    AD in Fortran, Part 2: Implementation via Prepreprocessor

    We describe an implementation of the Farfel Fortran AD extensions. These extensions integrate forward and reverse AD directly into the programming model, with attendant benefits to flexibility, modularity, and ease of use. The implementation we describe is a "prepreprocessor" that generates input to existing Fortran-based AD tools. In essence, blocks of code which are targeted for AD by Farfel constructs are put into subprograms which capture their lexical variable context, and these are closure-converted into top-level subprograms and specialized to eliminate EXTERNAL arguments, rendering them amenable to existing AD preprocessors, which are then invoked, possibly repeatedly if the AD is nested
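    Farfel itself operates on Fortran source, so the Python fragment below is only a language-neutral illustration of the closure-conversion step mentioned above: a nested subprogram that captures a lexical variable is rewritten as a top-level procedure taking its captured environment as an explicit argument. All names here are hypothetical.

    # Before closure conversion: a nested function captures `scale` from its
    # lexical environment.
    def make_scaled_residual(scale):
        def residual(x, target):
            return scale * (x - target)      # free variable `scale`
        return residual

    # After closure conversion: the nested function is lifted to top level and
    # its captured environment is passed explicitly, so it has no free
    # variables; a separate specialization pass could then inline the
    # environment for each call site.
    def residual_top_level(env, x, target):
        return env["scale"] * (x - target)

    def make_scaled_residual_cc(scale):
        env = {"scale": scale}
        return lambda x, target: residual_top_level(env, x, target)

    print(make_scaled_residual(2.0)(3.0, 1.0))      # 4.0
    print(make_scaled_residual_cc(2.0)(3.0, 1.0))   # 4.0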